New-generation GPUs such as the NVIDIA A100 are equipped with Multi-Instance GPU (MIG) technology, which allows a single GPU to be partitioned into multiple small, isolated instances. This technology gives users more flexibility to support both deep learning training and inference workloads, but utilizing it efficiently remains challenging. The vision of this paper is to provide a more comprehensive and practical benchmark study for MIG that eliminates the need for tedious manual benchmarking and tuning. To achieve this vision, the paper presents MIGPerf, an open-source tool that streamlines benchmark studies for MIG. Using MIGPerf, the authors conduct a series of experiments, including characterization of deep learning training and inference on MIG, GPU sharing characterization, and framework compatibility with MIG. The results provide new insights and guidance for users to employ MIG effectively, and lay the foundation for further research on orchestrating hybrid training and inference workloads on MIG. The code and results are released at https://github.com/MLSysOps/MIGProfiler. This work is still in progress and more results will be published soon.
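To make the MIG setting concrete, below is a minimal, hypothetical throughput measurement pinned to a single MIG instance via its UUID (real UUIDs can be listed with `nvidia-smi -L`); it illustrates the kind of per-instance benchmark MIGPerf automates and is not MIGPerf's actual API.

```python
# Hypothetical sketch: measure inference throughput on one MIG instance.
# The MIG UUID below is a placeholder; list real ones with `nvidia-smi -L`.
import os, time
os.environ["CUDA_VISIBLE_DEVICES"] = "MIG-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"

import torch
import torchvision.models as models

model = models.resnet50(weights=None).eval().cuda()
batch = torch.randn(32, 3, 224, 224, device="cuda")

with torch.no_grad():
    for _ in range(10):                      # warm-up iterations
        model(batch)
    torch.cuda.synchronize()
    start = time.time()
    iters = 100
    for _ in range(iters):
        model(batch)
    torch.cuda.synchronize()
    elapsed = time.time() - start

print(f"throughput: {iters * batch.shape[0] / elapsed:.1f} images/s")
```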
We propose a novel approach to self-supervised learning of point cloud representations via differentiable neural rendering. Motivated by the observation that informative point cloud features should encode rich geometry and appearance cues and be able to render realistic images, we train a point-cloud encoder within a devised point-based neural renderer by comparing the rendered images with real images on massive RGB-D data. The learned point-cloud encoder can be easily integrated into various downstream tasks, including not only high-level tasks such as 3D detection and segmentation, but also low-level tasks such as 3D reconstruction and image synthesis. Extensive experiments on various tasks demonstrate the superiority of our approach compared to existing pre-training methods.
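A minimal sketch of this pre-training loop is shown below; the encoder and renderer are toy placeholder modules and the loss is a plain photometric reconstruction term, standing in for the paper's point-based neural renderer.

```python
# Minimal sketch of the pre-training loop (encoder/renderer are placeholders,
# not the paper's actual architectures).
import torch
import torch.nn as nn

class PointEncoder(nn.Module):
    """Placeholder point-cloud encoder: per-point MLP producing point features."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, 128), nn.ReLU(), nn.Linear(128, feat_dim))
    def forward(self, pts):                       # pts: (B, N, 3)
        return self.mlp(pts)                      # per-point features (B, N, C)

class NeuralRenderer(nn.Module):
    """Placeholder renderer mapping pooled point features to an RGB image."""
    def __init__(self, feat_dim=256, img_size=64):
        super().__init__()
        self.img_size = img_size
        self.to_rgb = nn.Linear(feat_dim, img_size * img_size * 3)
    def forward(self, feats):                     # feats: (B, N, C)
        pooled = feats.max(dim=1).values          # global feature (B, C)
        img = self.to_rgb(pooled).view(-1, 3, self.img_size, self.img_size)
        return torch.sigmoid(img)

encoder, renderer = PointEncoder(), NeuralRenderer()
opt = torch.optim.Adam(list(encoder.parameters()) + list(renderer.parameters()), lr=1e-4)

points = torch.randn(8, 1024, 3)                  # stand-in for RGB-D point clouds
real_images = torch.rand(8, 3, 64, 64)            # paired real RGB images

rendered = renderer(encoder(points))
loss = nn.functional.mse_loss(rendered, real_images)   # photometric reconstruction loss
opt.zero_grad(); loss.backward(); opt.step()
```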
Time-series anomaly detection is an important task that has been widely applied in industry. Since manual data annotation is expensive and inefficient, most applications adopt unsupervised anomaly detection methods, but the results are usually sub-optimal and unsatisfactory to end customers. Weak supervision is a promising paradigm for obtaining considerable labels at low cost: it enables customers to label data by writing heuristic rules rather than annotating each instance individually. However, in the time-series domain it is hard for people to write reasonable labeling functions, as time-series data is numerically continuous and difficult to understand. In this paper, we propose a Label-Efficient Interactive Time-Series Anomaly Detection (LEIAD) system, which enables a user to improve the results of unsupervised anomaly detection with only a small number of interactions with the system. To achieve this goal, the system integrates weak supervision and active learning collaboratively while generating labeling functions automatically from only a few labeled data points. These techniques are complementary and reinforce each other. We conduct experiments on three time-series anomaly detection datasets, demonstrating that the proposed system is superior to existing solutions in both the weak supervision and active learning areas. The system has also been tested in a real industrial scenario, demonstrating its practicality.
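For illustration, the snippet below shows the kind of hand-written heuristic labeling functions that weak supervision aggregates for time series; LEIAD's own labeling functions are generated automatically, so these rules are only an assumed example of the paradigm.

```python
# Illustrative heuristic labeling functions for time-series anomaly detection
# (hand-written rules of the kind weak supervision aggregates; not LEIAD's
# automatically generated functions).
import numpy as np

ABSTAIN, NORMAL, ANOMALY = -1, 0, 1

def lf_spike(series, window=24, k=4.0):
    """Flag points more than k rolling standard deviations from the rolling mean."""
    labels = np.full(len(series), ABSTAIN)
    for t in range(window, len(series)):
        hist = series[t - window:t]
        mu, sigma = hist.mean(), hist.std() + 1e-8
        labels[t] = ANOMALY if abs(series[t] - mu) > k * sigma else NORMAL
    return labels

def lf_flatline(series, window=12, eps=1e-6):
    """Flag stretches where the signal stops varying (e.g. a stuck sensor)."""
    labels = np.full(len(series), ABSTAIN)
    for t in range(window, len(series)):
        if series[t - window:t].std() < eps:
            labels[t] = ANOMALY
    return labels

series = np.sin(np.linspace(0, 20, 500)) + 0.05 * np.random.randn(500)
series[300] += 3.0                                   # inject an obvious spike
votes = np.stack([lf_spike(series), lf_flatline(series)])
print("points flagged by at least one rule:", int((votes == ANOMALY).any(axis=0).sum()))
```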
Compared to typical multi-sensor systems, monocular 3D object detection has attracted much attention due to its simple configuration. However, there is still a significant gap between LiDAR-based and monocular-based methods. In this paper, we find that the ill-posed nature of monocular imagery can lead to depth ambiguity. Specifically, objects at different depths can appear with the same bounding boxes and similar visual features in the 2D image. Unfortunately, the network cannot accurately distinguish between such depths from non-discriminative visual features, resulting in unstable depth training. To facilitate depth learning, we propose a simple yet effective plug-and-play module, One Bounding Box Multiple Objects (OBMO). Concretely, we add a set of suitable pseudo labels by shifting the 3D bounding box along the viewing frustum. To constrain the pseudo-3D labels to be reasonable, we carefully design two label scoring strategies to represent their quality. In contrast to the original hard depth labels, such soft pseudo labels with quality scores allow the network to learn a reasonable depth range, boosting training stability and thus improving final performance. Extensive experiments on the KITTI and Waymo benchmarks show that our method improves state-of-the-art monocular 3D detectors by a significant margin (the improvements under the moderate setting on the KITTI validation set are $\mathbf{1.82\sim 10.91\%}$ mAP in BEV and $\mathbf{1.18\sim 9.36\%}$ mAP in 3D). Code has been released at https://github.com/mrsempress/OBMO.
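A simplified sketch of the pseudo-label generation follows: a 3D box centre is shifted along the camera ray through it and its size is scaled with depth, so that the 2D projection stays roughly fixed; the paper's two label scoring strategies are omitted here.

```python
# Simplified sketch of OBMO-style pseudo labels: shift a 3D box centre along
# the viewing ray and scale its size with depth so the projected 2D box stays
# roughly unchanged. Quality scoring of the pseudo labels is omitted.
import numpy as np

def shift_along_frustum(center, dims, depth_offsets):
    """center: (x, y, z) in camera coords; dims: (h, w, l); returns pseudo labels."""
    x, y, z = center
    pseudo = []
    for dz in depth_offsets:
        z_new = z + dz
        scale = z_new / z                      # keep the projected 2D box size similar
        pseudo.append({
            "center": (x * scale, y * scale, z_new),   # stay on the same viewing ray
            "dims": tuple(d * scale for d in dims),
        })
    return pseudo

labels = shift_along_frustum(center=(1.5, 1.0, 20.0), dims=(1.6, 1.7, 4.0),
                             depth_offsets=np.linspace(-2.0, 2.0, 5))
for p in labels:
    print(p)
```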
Attention-based neural networks, such as Transformers, have become ubiquitous in numerous applications, including computer vision, natural language processing, and time-series analysis. In all kinds of attention networks, the attention maps are crucial as they encode semantic dependencies between input tokens. However, most existing attention networks perform modeling or reasoning based on representations, wherein the attention maps of different layers are learned separately without explicit interactions. In this paper, we propose a novel and generic evolving attention mechanism, which directly models the evolution of inter-token relationships through a chain of residual convolutional modules. The motivations are twofold. On the one hand, the attention maps in different layers share transferable knowledge, so adding a residual connection can facilitate the flow of inter-token relationships across layers. On the other hand, there is naturally an evolutionary trend among attention maps at different abstraction levels, so it is beneficial to exploit a dedicated convolution-based module to capture this process. Equipped with the proposed mechanism, convolution-enhanced evolving attention networks achieve superior performance in various applications, including time-series representation, natural language understanding, machine translation, and image classification. On time-series representation tasks in particular, the Evolving Attention-enhanced Dilated Convolutional (EA-DC-) Transformer outperforms state-of-the-art models significantly, achieving an average 17% improvement over the best SOTA. To the best of our knowledge, this is the first work that explicitly models the layer-wise evolution of attention maps. Our implementation is available at https://github.com/pkuyym/EvolvingAttention.
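The snippet below is a minimal single-head sketch of the evolving-attention idea: the current layer's attention logits are mixed with a residual convolution applied to the previous layer's attention map before the softmax. The paper's full design chains residual convolutional modules across all layers and heads, so treat this only as an illustration.

```python
# Minimal single-head sketch of evolving attention: the current layer's
# attention logits are combined with a convolution over the previous layer's
# attention map before softmax (simplified from the paper's design).
import torch
import torch.nn as nn

class EvolvingAttention(nn.Module):
    def __init__(self, dim, alpha=0.5):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.conv = nn.Conv2d(1, 1, kernel_size=3, padding=1)   # evolves the attention map
        self.alpha = alpha
        self.scale = dim ** -0.5

    def forward(self, x, prev_attn=None):          # x: (B, T, D), prev_attn: (B, T, T)
        logits = self.q(x) @ self.k(x).transpose(-2, -1) * self.scale
        if prev_attn is not None:
            evolved = self.conv(prev_attn.unsqueeze(1)).squeeze(1)
            logits = self.alpha * logits + (1 - self.alpha) * evolved   # residual bridge
        attn = logits.softmax(dim=-1)
        return attn @ self.v(x), attn

layer1, layer2 = EvolvingAttention(64), EvolvingAttention(64)
x = torch.randn(2, 16, 64)
out1, attn1 = layer1(x)
out2, attn2 = layer2(out1, prev_attn=attn1)        # attention maps flow across layers
```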
The pretraining-finetuning paradigm has demonstrated great success in NLP and 2D image fields because of the high-quality representation ability and transferability of their pretrained models. However, pretraining such a strong model is difficult in the 3D point cloud field since the training data is limited and point cloud collection is expensive. This paper introduces \textbf{E}fficient \textbf{P}oint \textbf{C}loud \textbf{L}earning (EPCL), an effective and efficient point cloud learner for directly training high-quality point cloud models with a frozen CLIP model. Our EPCL connects the 2D and 3D modalities by semantically aligning the 2D features and point cloud features without paired 2D-3D data. Specifically, the input point cloud is divided into a sequence of tokens and directly fed into the frozen CLIP model to learn point cloud representation. Furthermore, we design a task token to narrow the gap between 2D images and 3D point clouds. Comprehensive experiments on 3D detection, semantic segmentation, classification and few-shot learning demonstrate that the 2D CLIP model can be an efficient point cloud backbone and our method achieves state-of-the-art accuracy on both real-world and synthetic downstream tasks. Code will be available.
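A schematic sketch of this idea follows: point-cloud patches are embedded into the token space of a frozen transformer, a learnable task token is prepended, and only the tokenizer and head are trained. A generic nn.TransformerEncoder stands in for the frozen CLIP visual backbone here, so the module shapes and names are assumptions rather than the paper's implementation.

```python
# Schematic sketch of the EPCL idea: point-cloud patches are embedded into the
# token space of a frozen transformer, a learnable task token is prepended,
# and only the tokenizer/head are trained. A generic nn.TransformerEncoder
# stands in for the frozen CLIP visual transformer.
import torch
import torch.nn as nn

class PointTokenizer(nn.Module):
    """Groups a point cloud into fixed-size patches and embeds each one."""
    def __init__(self, points_per_patch=32, dim=512):
        super().__init__()
        self.points_per_patch = points_per_patch
        self.embed = nn.Linear(points_per_patch * 3, dim)
    def forward(self, pts):                                   # (B, N, 3)
        B, N, _ = pts.shape
        patches = pts.view(B, N // self.points_per_patch, -1)
        return self.embed(patches)                            # (B, T, dim)

dim = 512
frozen_backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True), num_layers=4)
for p in frozen_backbone.parameters():
    p.requires_grad = False                                   # backbone stays frozen

tokenizer = PointTokenizer(dim=dim)
task_token = nn.Parameter(torch.zeros(1, 1, dim))             # learnable task token
head = nn.Linear(dim, 40)                                     # e.g. 40-way classification

pts = torch.randn(2, 1024, 3)
tokens = torch.cat([task_token.expand(2, -1, -1), tokenizer(pts)], dim=1)
features = frozen_backbone(tokens)
logits = head(features[:, 0])                                  # predict from the task token
```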
This paper presents an approach that reconstructs a hand-held object from a monocular video. In contrast to many recent methods that directly predict object geometry by a trained network, the proposed approach does not require any learned prior about the object and is able to recover more accurate and detailed object geometry. The key idea is that the hand motion naturally provides multiple views of the object and the motion can be reliably estimated by a hand pose tracker. Then, the object geometry can be recovered by solving a multi-view reconstruction problem. We devise an implicit neural representation-based method to solve the reconstruction problem and address the issues of imprecise hand pose estimation, relative hand-object motion, and insufficient geometry optimization for small objects. We also provide a newly collected dataset with 3D ground truth to validate the proposed approach.
Compared to the great progress of large-scale vision transformers (ViTs) in recent years, large-scale models based on convolutional neural networks (CNNs) are still at an early stage. This work presents a new large-scale CNN-based foundation model, termed InternImage, which can benefit from increasing parameters and training data in the way ViTs do. Unlike recent CNNs that focus on large dense kernels, InternImage takes deformable convolution as the core operator, so that our model not only has the large effective receptive field required for downstream tasks such as detection and segmentation, but also has adaptive spatial aggregation conditioned on input and task information. As a result, the proposed InternImage reduces the strict inductive bias of traditional CNNs and makes it possible to learn stronger and more robust patterns with large-scale parameters from massive data, as ViTs do. The effectiveness of our model is proven on challenging benchmarks including ImageNet, COCO, and ADE20K. Notably, InternImage-H achieves a new record of 65.4 mAP on COCO test-dev. The code will be released at https://github.com/OpenGVLab/InternImage.
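As a rough illustration of the core operator, the block below uses torchvision's DCNv2-style DeformConv2d (a simplified stand-in for InternImage's DCNv3 operator): sampling offsets are predicted from the input, so spatial aggregation adapts to image content instead of a fixed grid.

```python
# Minimal sketch of a deformable-convolution block: offsets are predicted from
# the input, so the receptive field adapts to content rather than staying on a
# fixed grid (torchvision's DCNv2-style operator, not InternImage's DCNv3).
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformableBlock(nn.Module):
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        # 2 offsets (dx, dy) per kernel location
        self.offset_pred = nn.Conv2d(channels, 2 * kernel_size * kernel_size,
                                     kernel_size, padding=pad)
        self.dcn = DeformConv2d(channels, channels, kernel_size, padding=pad)
        self.norm = nn.BatchNorm2d(channels)
        self.act = nn.GELU()

    def forward(self, x):
        offset = self.offset_pred(x)          # input-conditioned sampling locations
        return x + self.act(self.norm(self.dcn(x, offset)))   # residual block

x = torch.randn(1, 64, 56, 56)
print(DeformableBlock(64)(x).shape)           # torch.Size([1, 64, 56, 56])
```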
The rapid growth and deployment of deep learning (DL) has been accompanied by emerging privacy and security concerns. To mitigate these issues, secure multi-party computation (MPC) has been discussed as a way to enable privacy-preserving DL computation. In practice, however, MPC protocols often come with very high computation and communication overhead, which can prohibit their adoption in large-scale systems. Two orthogonal research trends have attracted great interest in improving the energy efficiency of secure deep learning: overhead reduction of the MPC comparison protocol, and hardware acceleration. However, existing works either achieve a low reduction ratio and thus suffer from high latency due to limited computation and communication savings, or are power-hungry, since they mainly target general computing platforms such as CPUs and GPUs. In this work, as a first attempt, we develop PolyMPCNet, a systematic framework for joint overhead reduction of the MPC comparison protocol and hardware acceleration, which integrates the hardware latency of cryptographic building blocks into the DNN loss function to achieve high energy efficiency together with security guarantees. Our key design principle is, instead of checking model sensitivity after a DNN has been well trained (by trimming or removing certain operators), to enforce exactly what is assumed in the DNN design: training a DNN that is both hardware-efficient and secure, while escaping local minima and saddle points and maintaining high accuracy. More specifically, we propose a cryptographic-hardware-friendly trainable polynomial activation function, initialized directly via a polynomial activation initialization method, to replace the expensive 2P-ReLU operator. We also develop a cryptographic hardware scheduler and the corresponding performance model for field-programmable gate array (FPGA) platforms.
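As an illustration, the snippet below sketches a trainable second-order polynomial activation of the kind that can replace ReLU in MPC-friendly networks; the coefficient initialization (a smooth quadratic approximation of ReLU) is an assumption, not the paper's exact initialization scheme.

```python
# Illustrative sketch of a trainable polynomial activation that can replace
# ReLU in MPC-friendly networks (coefficients below are assumptions).
import torch
import torch.nn as nn

class PolyActivation(nn.Module):
    """y = a*x^2 + b*x + c with learnable coefficients."""
    def __init__(self, a=0.25, b=0.5, c=0.0):
        super().__init__()
        # Initialized near a smooth quadratic approximation of ReLU.
        self.a = nn.Parameter(torch.tensor(a))
        self.b = nn.Parameter(torch.tensor(b))
        self.c = nn.Parameter(torch.tensor(c))
    def forward(self, x):
        return self.a * x * x + self.b * x + self.c

model = nn.Sequential(nn.Linear(128, 256), PolyActivation(), nn.Linear(256, 10))
print(model(torch.randn(4, 128)).shape)        # torch.Size([4, 10])
```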
As real-world graphs grow in scale, larger GNN models with billions of parameters are being deployed. The high parameter count in such models makes training and inference on graphs expensive and challenging. To reduce the computation and memory cost of GNNs, optimization methods such as pruning redundant nodes and edges in the input graph are commonly adopted. However, model compression that directly targets sparsifying the model weight layers has mostly been limited to traditional deep neural networks (DNNs) used for tasks such as image classification and object detection. In this paper, we utilize two state-of-the-art model compression methods, (1) train-and-prune and (2) sparse training, to sparsify the weight layers of GNNs. We evaluate and compare the efficiency of both methods in terms of accuracy, training sparsity, and training FLOPs on real-world graphs. Our experimental results show that on the IA-Email, Wiki-Talk, and Stackoverflow datasets for link prediction, sparse training achieves accuracy comparable to the train-and-prune method with much lower training FLOPs. On the Brain dataset for node classification, sparse training uses fewer FLOPs (less than 1/7 of the train-and-prune method) and preserves much better accuracy under extreme model sparsity.
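As an illustration of the train-and-prune route, the sketch below applies magnitude-based weight pruning to the weight layer of a toy dense-adjacency GCN layer using torch.nn.utils.prune; it shows the mechanism only, not the paper's exact pipeline or the sparse-training alternative.

```python
# Hedged sketch of magnitude-based train-and-prune on the weight layer of a
# toy (dense-adjacency) GCN layer; illustrative only, not the paper's pipeline.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

class GCNLayer(nn.Module):
    """relu(A_hat @ X @ W) with a dense normalized adjacency (small graphs only)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim, bias=False)
    def forward(self, adj, x):
        return torch.relu(self.linear(adj @ x))

n, in_dim, out_dim = 100, 32, 16
adj = torch.eye(n)                                   # stand-in normalized adjacency
x = torch.randn(n, in_dim)
layer = GCNLayer(in_dim, out_dim)

# 1) train densely (one dummy optimization step shown) ...
opt = torch.optim.Adam(layer.parameters(), lr=1e-3)
loss = layer(adj, x).pow(2).mean()
opt.zero_grad(); loss.backward(); opt.step()

# 2) ... then prune 90% of the smallest-magnitude weights and fine-tune.
prune.l1_unstructured(layer.linear, name="weight", amount=0.9)
sparsity = (layer.linear.weight == 0).float().mean()
print(f"weight sparsity after pruning: {sparsity:.0%}")
```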